Submission

All teams are required to register prior to submission. Early registration will help us to plan the listening panel.

See Key Dates for the schedule, including the submission deadline.

1. What audio do I need to submit?

Submit signals as 16-bit FLAC files with a 24 kHz or 32 kHz sampling rate, depending on the signal (details below). Given the capabilities of the headsets used by the listening panel, 0 dB FS corresponds to 100 dB SPL. See the page on listening tests for more information about reproduction levels from the headset. We will play your signals to listeners as is, so the final signal level is your responsibility. Bear in mind that if your signals seem uncomfortably loud overall, participants may well turn down the volume themselves. In some tasks, the evaluation block may also clip processed signals whose amplitude is too large.
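As a rough illustration of the level mapping above, the sketch below estimates the reproduction level of a signal and counts samples that would clip when quantised to 16 bits. It assumes floating-point samples where full scale is ±1.0, and it measures dB FS from the RMS value so that a constant full-scale signal reads 0 dB FS; other dB FS conventions exist and the challenge text does not say which one applies. All function names are illustrative and are not part of the baseline.

```python
import math

# From the text: 0 dB FS corresponds to 100 dB SPL.
FULL_SCALE_DB_SPL = 100.0
INT16_MAX = 32767
INT16_MIN = -32768


def rms_dbfs(samples):
    """RMS level in dB FS for float samples, assuming full scale is 1.0."""
    if not samples:
        raise ValueError("empty signal")
    mean_square = sum(x * x for x in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")  # digital silence
    return 10.0 * math.log10(mean_square)


def approx_db_spl(samples):
    """Approximate playback level under the 0 dB FS = 100 dB SPL mapping."""
    return rms_dbfs(samples) + FULL_SCALE_DB_SPL


def count_clipped(samples):
    """Number of samples outside [-1.0, 1.0], which cannot be stored as is."""
    return sum(1 for x in samples if abs(x) > 1.0)


def to_int16(samples):
    """Quantise to 16-bit integers, hard-clipping out-of-range samples."""
    return [
        max(INT16_MIN, min(INT16_MAX, round(x * INT16_MAX)))
        for x in samples
    ]
```

Checking `count_clipped` before export is one way to catch signals that are too large; how you bring them back into range (gain reduction, limiting, and so on) is your own design choice.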

1.1. Task 1: Headphones

You must submit the following audio for all the signals in the evaluation set:

  • The VDBO (vocal, drums, bass, other) demixed signals for both left and right.
    • Predefined 30-second segment.
    • 16-bit
    • 24 kHz sampling rate
    • Compressed using the lossless FLAC compressor
  • The remixed stereo signal.
    • Predefined 15-second segment.
    • 16-bit
    • 32 kHz sampling rate
    • Compressed using the lossless FLAC compressor

If you replace the whole enhancement block with another approach, only the left and right output signals from your enhancement processor are required.
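The Task 1 requirements above can be written down as a small specification table, which also gives the number of frames each file should contain. This is only a sketch: the class and constant names are made up for illustration, and it assumes the predefined segments are exactly 30 and 15 seconds long.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioSpec:
    """Expected format of one submitted signal."""
    bit_depth: int
    sample_rate_hz: int
    duration_s: int

    @property
    def frames(self):
        # Frames per channel, assuming the segment has exactly this duration.
        return self.sample_rate_hz * self.duration_s


# Values taken from the Task 1 lists above.
TASK1_STEM = AudioSpec(bit_depth=16, sample_rate_hz=24000, duration_s=30)
TASK1_REMIX = AudioSpec(bit_depth=16, sample_rate_hz=32000, duration_s=15)


def matches(spec, bit_depth, sample_rate_hz, n_frames):
    """True if a file's measured properties agree with the expected spec."""
    return (
        bit_depth == spec.bit_depth
        and sample_rate_hz == spec.sample_rate_hz
        and n_frames == spec.frames
    )
```

For example, `TASK1_STEM.frames` is 720000 and `TASK1_REMIX.frames` is 480000, so a stem saved at 32 kHz by mistake would fail `matches(TASK1_STEM, ...)`.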

1.2. Task 2: Car

You must submit the following audio for all the signals in the evaluation set:

  • The processed music output, as enhanced by the car stereo.
    • 16-bit
    • 32 kHz sampling rate
    • Compressed using the lossless FLAC compressor

2. Code

We encourage you to make your code open source.

3. Technical report

  • Draft:
    • A draft of the report needs to be uploaded along with your processed signals.
    • The draft needs to be sufficiently complete for us to judge whether your system is compliant with the challenge rules.
  • Technical Report:
    • A two-page technical report must be submitted as a paper to the Cadenza-2023 Workshop (date to be confirmed).
    • Your report should include an abstract, an introduction, and sections on experimental setup/methodology (including system information and model/network architecture), evaluation/results, discussion, conclusion and references. Please provide an estimate of the computational resources needed. You must describe any external data and pre-existing tools, software and models used.
    • You can use the Interspeech 2023 template.

4. Where do I submit the signals?

Once you have registered, you will receive a link to a OneDrive folder where you can securely upload your signals.

Materials uploaded will be visible to the Cadenza Team but not to other entrants.

Your processed signals should be named using the conventions used by the baseline system:

  • Task 1: <Listener ID>/<Song Name>/<Listener ID>_<Song Name>_<Channel>_<Stem>.flac and <Listener ID>/<Song Name>/<Listener ID>_<Song Name>_remix.wav, as explained on the Task 1 data page.
  • Task 2: <Dataset Split>/<Listener ID>/<Scene_ID>_<Listener ID>_<Song ID>.flac, as explained on the Task 2 data page.

These should be placed in a directory whose name is the unique team ID that you will be sent, e.g., submission_E001, and then packaged using zip, tar or any standard packaging tool.
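The naming conventions above can be sketched as small path-building helpers. The function names and the example values (`L0001`, `left`, `vocals`, and so on) are hypothetical; the actual identifiers and allowed channel and stem labels are defined on the task data pages and by the baseline system.

```python
from pathlib import PurePosixPath


def task1_stem_path(listener_id, song_name, channel, stem):
    """<Listener ID>/<Song Name>/<Listener ID>_<Song Name>_<Channel>_<Stem>.flac"""
    filename = f"{listener_id}_{song_name}_{channel}_{stem}.flac"
    return PurePosixPath(listener_id, song_name, filename)


def task1_remix_path(listener_id, song_name):
    """<Listener ID>/<Song Name>/<Listener ID>_<Song Name>_remix.wav"""
    filename = f"{listener_id}_{song_name}_remix.wav"
    return PurePosixPath(listener_id, song_name, filename)


def task2_path(dataset_split, listener_id, scene_id, song_id):
    """<Dataset Split>/<Listener ID>/<Scene_ID>_<Listener ID>_<Song ID>.flac"""
    filename = f"{scene_id}_{listener_id}_{song_id}.flac"
    return PurePosixPath(dataset_split, listener_id, filename)


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    print(task1_stem_path("L0001", "some_song", "left", "vocals"))
    print(task1_remix_path("L0001", "some_song"))
    print(task2_path("eval", "L0001", "S0001", "000123"))
```

Building paths from one place like this makes it less likely that a single misnamed file slips into the upload.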

The resulting files should be about 21 GB for Task 1 and 7.3 GB for Task 2.
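One possible way to package a submission and check its size against the figures above is sketched below, using Python's standard `zipfile` module. The function name is illustrative, and the directory name passed in (for example `submission_E001`) stands for whatever team directory name you are sent. Files are stored without further compression on the assumption that FLAC data will not shrink much more.

```python
import zipfile
from pathlib import Path


def package_submission(signal_dir, team_dir_name, out_dir):
    """Zip every file under signal_dir beneath a top-level team directory.

    Returns the archive path and the total size in bytes of the files added.
    out_dir should be outside signal_dir so the archive does not include itself.
    """
    signal_dir = Path(signal_dir)
    archive = Path(out_dir) / f"{team_dir_name}.zip"
    total_bytes = 0
    # ZIP_STORED: no recompression, since FLAC files are already compressed.
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_STORED) as zf:
        for path in sorted(signal_dir.rglob("*")):
            if path.is_file():
                relative = path.relative_to(signal_dir).as_posix()
                zf.write(path, arcname=f"{team_dir_name}/{relative}")
                total_bytes += path.stat().st_size
    return archive, total_bytes
```

Dividing `total_bytes` by `1e9` gives a size in GB to compare with the expected totals; a large mismatch may point to missing files or a wrong sampling rate.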